Theory and Practise of Monotone Minimal Perfect Hashing
Authors
Abstract
Minimal perfect hash functions have been shown to be useful for compressing data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys; however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case for dictionaries of search engines, lists of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along the way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practice and provide a balance between access speed, ease of construction, and space usage.
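As a hedged illustration of what such a function computes (not of the data structures proposed in [1]), the sketch below specifies a monotone minimal perfect hash purely by its behaviour: each key of a lexicographically sorted list maps to its rank. The name `build_monotone_mph` and the sample keys are hypothetical, and the sketch stores every key explicitly, which is precisely the space cost that succinct constructions aim to avoid.

```python
from bisect import bisect_left


def build_monotone_mph(keys):
    """Return a function mapping each key to its rank in sorted order.

    Reference semantics only (an assumption for illustration): the whole
    key list is kept in memory and queried by binary search.
    """
    sorted_keys = sorted(set(keys))  # lexicographic order, duplicates dropped

    def rank(key):
        i = bisect_left(sorted_keys, key)
        # Structures that do not store the keys typically return an
        # arbitrary value for keys outside the set; this sketch raises.
        if i == len(sorted_keys) or sorted_keys[i] != key:
            raise KeyError(key)
        return i

    return rank


h = build_monotone_mph(["b.org", "a.com", "c.net"])
ranks = [h(k) for k in ["a.com", "b.org", "c.net"]]  # bijection onto 0..n-1
```

Because the output is the rank, the mapping is both minimal (it hits every value in 0..n-1 exactly once) and monotone (a smaller key always gets a smaller value).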
Similar resources
Monotone minimal perfect hashing: searching a sorted table with O(1) accesses
A minimal perfect hash function maps a set S of n keys into the set {0, 1, ..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log e bits per element. Here we consider the problem of monotone minimal perfect hashing, in which the bijection is required to preserve the lexicogr...
Perfect hashing using sparse matrix packing
This article presents a simple algorithm for packing sparse 2-D arrays into minimal 1-D arrays in O(r^2) time. Retrieving an element from the packed 1-D array is O(1). This packing algorithm is then applied to create minimal perfect hashing functions for large word lists. Many existing perfect hashing algorithms process large word lists by segmenting them into several smaller lists. The perfect ...
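The general idea of packing a sparse 2-D array into a 1-D array can be sketched with generic first-fit row displacement. This is a common textbook technique and an assumption here, not necessarily the article's own algorithm; the names `pack_rows` and `lookup` are hypothetical, and `None` is assumed to mark an empty cell.

```python
def pack_rows(matrix):
    """Pack a sparse 2-D array into a 1-D array by first-fit row displacement.

    Returns (packed, offsets), where offsets[r] is the shift applied to row r.
    """
    packed = []
    offsets = []
    for row in matrix:
        cols = [c for c, v in enumerate(row) if v is not None]
        offset = 0
        # Slide the row right until none of its non-empty cells collide
        # with a cell that an earlier row already occupies.
        while any(offset + c < len(packed) and packed[offset + c] is not None
                  for c in cols):
            offset += 1
        needed = offset + (cols[-1] + 1 if cols else 0)
        packed.extend([None] * (needed - len(packed)))  # grow only if required
        for c in cols:
            packed[offset + c] = row[c]
        offsets.append(offset)
    return packed, offsets


def lookup(packed, offsets, r, c):
    """O(1) retrieval; only meaningful for cells that were non-empty."""
    i = offsets[r] + c
    return packed[i] if i < len(packed) else None


matrix = [[1, None, 2],
          [None, 3, None],
          [4, None, None]]
packed, offsets = pack_rows(matrix)
```

Looking up a cell that was empty in the original array may return a value belonging to another row, so this layout suits perfect hashing, where only keys known to be in the set are queried.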
Using Tries to Eliminate Pattern Collisions in Perfect Hashing
Many current perfect hashing algorithms suffer from the problem of pattern collisions. In this paper, a perfect hashing technique that uses array-based tries and a simple sparse matrix packing algorithm is introduced. This technique eliminates all pattern collisions, and because of this it can be used to form ordered minimal perfect hash functions on extremely large word lists. This algorithm i...
A Survey on Efficient Hashing Techniques in Software Configuration Management
This paper presents a survey of efficient hashing techniques in software configuration management scenarios. It introduces the most important hashing techniques, such as open hashing, separate chaining, and minimal perfect hashing. Furthermore, we evaluate those hashing techniques utilizing large data sets. Therefore we compare the hash functions in terms of time to build the data structur...
Perfect Hashing for Data Management Applications
Perfect hash functions can potentially be used to compress data in connection with a variety of data management tasks. Though there has been considerable work on how to construct good perfect hash functions, there is a gap between theory and practice among all previous methods on minimal perfect hashing. On one side, there are good theoretical results without experimentally proven practicality ...
Journal title:
Volume / Issue
Pages -
Publication date: 2009